GenomeCompress: A Novel Algorithm for DNA Compression
Authors
Abstract
The genome of an organism contains all of its hereditary information, encoded in DNA. Since the genome determines how an organism survives, develops and reproduces, sequencing it is extremely important. Over the past three decades, massive DNA sequencing efforts have made the complete genome sequences of a large number of organisms, including humans, available, and genomic databases are growing exponentially. Because genomes are so large, efficient compression algorithms are required. General-purpose text compression algorithms do not exploit the specific characteristics of DNA sequences, whereas DNA-specific compression algorithms exploit the repetitiveness of bases in DNA sequences. Repetitive DNA sequences are best compressed with dictionary-based compression algorithms. Non-repetitive parts of DNA are generally compressed using dynamic programming: the sequence is divided into square matrices, each containing a repeat of a single base, and each matrix is replaced by that base while the order of the matrix is recorded in a string. In this paper, a novel DNA compression algorithm is proposed that compresses both repetitive and non-repetitive DNA sequences. The algorithm is compared with existing ones and is found to achieve a better compression ratio than the others.
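The abstract does not describe the internals of GenomeCompress, so the following is only a generic illustration of the dictionary-based idea it mentions for repetitive regions: a minimal LZ77-style sketch in Python that replaces repeats with (offset, length) back-references. The function names (`lz77_encode`, `lz77_decode`, `estimated_bits`), the window size, the minimum match length and the bit-cost model (1 flag bit per token, 2 bits per literal base assuming only A/C/G/T, 12 + 12 bits per reference) are all hypothetical assumptions, not the authors' method. The 2-bit literal cost reflects the four-letter alphabet, one simple way a DNA-specific coder can beat 8-bit text encoding.

```python
from typing import List, Tuple, Union

# A token is either a literal base (str) or an LZ77-style back-reference
# (offset, length) into the output decoded so far.
Token = Union[str, Tuple[int, int]]


def lz77_encode(seq: str, window: int = 4096, min_len: int = 4,
                max_len: int = 4095) -> List[Token]:
    """Greedy LZ77-style encoder: replaces repeats with (offset, length) references."""
    tokens: List[Token] = []
    i, n = 0, len(seq)
    while i < n:
        best_len, best_off = 0, 0
        # Search the sliding window of previous bases for the longest match at i.
        for j in range(max(0, i - window), i):
            length = 0
            # The match may run past position i (overlapping copy), as in LZ77.
            while (length < max_len and i + length < n
                   and seq[j + length] == seq[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(seq[i])  # too short to be worth a reference
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    """Inverse of lz77_encode."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            off, length = tok
            # Copy one base at a time so overlapping references work.
            for _ in range(length):
                out.append(out[-off])
    return "".join(out)


def estimated_bits(tokens: List[Token], off_bits: int = 12, len_bits: int = 12) -> int:
    """Hypothetical cost model: 1 flag bit per token, 2 bits per literal
    (assumes only A/C/G/T), off_bits + len_bits per back-reference."""
    total = 0
    for tok in tokens:
        total += 1 + (2 if isinstance(tok, str) else off_bits + len_bits)
    return total


if __name__ == "__main__":
    seq = "ACGTTGCA" * 50 + "GATTACA"  # toy, highly repetitive input
    tokens = lz77_encode(seq)
    assert lz77_decode(tokens) == seq
    ratio = (8 * len(seq)) / estimated_bits(tokens)
    print(f"{len(seq)} bases -> {len(tokens)} tokens, "
          f"estimated ratio vs 8-bit ASCII: {ratio:.1f}")
```

On the repetitive toy input a single back-reference absorbs most of the sequence. On non-repetitive input almost every token is a literal, so the dictionary step contributes nothing, which is why the abstract treats repetitive and non-repetitive regions with different strategies.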
Similar resources
A Novel Color Image Compression Method Using Eigenimages
Since the birth of multi-spectral imaging techniques, there has been a tendency to consider and process this new type of data as a set of parallel gray-scale images instead of as an ensemble of an n-D realization. Although some researchers still make this assumption, it has been shown that using vector geometries leads to better results. In this paper, first a method is prop...
DNABIT Compress – Genome compression algorithm
Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present "DNABIT Compress", a compression algorithm for DNA sequences based on a novel algorithm of assigning binary bits...
Parallelizing Assignment Problem with DNA Strands
Background: Many problems of combinatorial optimization, which are solvable only in exponential time, are known to be Non-Deterministic Polynomial hard (NP-hard). With the advent of parallel machines, new opportunities have emerged to develop effective solutions for NP-hard problems. However, solving these problems in polynomial time requires massively parallel machines and ...
Implementation of VLSI Based Image Compression Approach on Reconfigurable Computing System - A Survey
Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image, and an efficient image compression technique is in high demand. Although many compression techniques are available, a technique that is faster, memory-efficient and simple surely...
Determining the Proper compression Algorithm for Biomedical Signals and Design of an Optimum Graphic System to Display Them (TECHNICAL NOTES)
In this paper, the need for a data reduction algorithm when using digital graphic systems to display biomedical signals is first addressed, and then several such algorithms are compared from different points of view (such as complexity, real-time feasibility, etc.). It is then concluded that the Turning Point algorithm can be a suitable one for real-time implementation on a microproc...